Rank | Count | Beginning |
---|---|---|
13 | 18019 | Il |
1 | 15771 | La |
3 | 13348 | E |
69 | 9163 | Ma |
39 | 7218 | Non |
161 | 5993 | Per |
9 | 5436 | A |
50 | 5400 | I |
2 | 5216 | In |
18 | 4320 | Le |
117 | 3624 | Un |
78 | 3467 | Se |
162 | 3045 | Nel |
5 | 2975 | Anche |
32 | 2920 | Lo |
93 | 2671 | E' |
127 | 2586 | Una |
422 | 2414 | Sì |
308 | 2115 | Con |
354 | 2081 | Secondo |
45 | 1966 | Poi |
61 | 1795 | Da |
24 | 1769 | Dopo |
40 | 1753 | Come |
109 | 1734 | Al |
34 | 1631 | Sono |
23 | 1630 | Che |
288 | 1390 | Ora |
131 | 1345 | Gli |
35 | 1343 | Ci |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
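The query above takes the text before the first space in each sentence and counts how often each such beginning occurs. As a rough illustration, the same computation can be sketched in Python, assuming the sentences are available as a list of strings. The function name `first_word_counts` and the sample sentences are hypothetical and not part of the corpus tooling.

```python
from collections import Counter


def first_word_counts(sentences, limit=50):
    """Return the most frequent first tokens as (beginning, count) pairs."""
    # Mirrors substring_index(sentence, ' ', 1): the text before the first
    # space, or the whole sentence if it contains no space.
    counts = Counter(s.split(" ", 1)[0] for s in sentences)
    # Equivalent of "order by cnt desc limit 50".
    return counts.most_common(limit)


# Hypothetical sample data, not taken from the corpus.
sample = ["Il gatto dorme.", "La casa è grande.", "Il cane corre.", "Sì"]
for beginning, count in first_word_counts(sample, limit=3):
    print(beginning, count)
```

Like the SQL query, this sketch splits on a single space only, so a one-word sentence is counted together with its trailing punctuation.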
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV